BACKGROUND OF THE INVENTION
1. Field of the Invention 

This invention pertains generally to bandwidth reservation within multipath 
networks, and more particularly to a method of limiting and maintaining the reservation
state in network routers.
2. Description of the Background Art

Real-time multimedia applications require strict delay and bandwidth guarantees.
A network can provide such deterministic guarantees to an application only if it reserves
the required bandwidth and other necessary resources. Based on this reservation
paradigm, the "Internet Engineering Task Force" (IETF) developed the "Integrated
Services" (Intserv) architecture and the RSVP signaling protocol. A major concern,
however, with the Intserv/RSVP architecture is that the soft-state mechanism it utilizes
to maintain the consistency of reservation state may not be scalable to high-speed
backbone networks. With a sufficiently large number of flows, the refresh
messages, in addition to consuming memory, processing power, and bandwidth, can
experience significant queuing delays which can precipitate failures of the soft-state
mechanism. For the refresh mechanism to properly scale, the reservation state
must either be eliminated or drastically reduced in size. The Intserv architecture was directed
to providing such deterministic delay guarantees to applications that require them. In
Intserv, the network reserves the required link bandwidth for each application, and then
uses fair scheduling algorithms, such as WFQ, to ensure that each application receives
its allotted bandwidth. Routers in the Intserv model must therefore remember the
reservations for each flow and service each flow according to its reservation.

As the links in backbone networks reach gigabit capacities, routers are expected
to carry a large number of flows, and the question arises as to whether the routers will be
capable of scheduling the packets in a timely manner. For instance, if v is the number
of flows passing through a link, sorted-priority schedulers require O(log(v)) instructions
to make a per-packet scheduling decision. While scalability of link scheduling is a major
concern, a more serious problem is related to maintaining the consistency of
reservations in the presence of resource failures and control message loss. If resource
reservations understate the actual reservations, delays cannot be guaranteed.
However, resources are wasted when resource reservations overstate actual
reservations. To implement robust mechanisms for maintaining resource reservations,
the IETF proposed RSVP, which uses soft-state refreshing to maintain the
reservation state. As a result of a large number of flows in the backbone, the volume of
refresh messages can be sufficient to create delays and packet losses in response to
the congestion. Refresh messages are time-sensitive and such delays and losses can
easily destabilize the refresh mechanism. To deliver refresh messages in bounded
time, the state size is preferably bounded. Therefore, scheduling and soft-state
refreshing are two components of the Intserv architecture, among others, that would
benefit from scalable solutions.

Certain schedulers which are based on a framing strategy can perform 
scheduling decisions in O(1), but provide looser delay bounds. Scalable solutions to
soft-state reservations, however, are not as easily forthcoming, due to additional issues
such as packet classification and QoS path selection that require scalable solutions for
successful implementation of Intserv. The packet classification problem, the QoS 
routing problem, and the solutions thereof are becoming well known in the industry, and 
are therefore not described herein. 

It may appear that with current high-speed processors and inexpensive
memory, the Intserv architecture and the associated RSVP can be implemented using
per-flow processing. It will be appreciated, however, that the main concern is that the
size of the reservation state and the refresh message overhead are determined by
the number of flows. When the volume of refresh messages is high, the effect of
queuing delays due to congestion cannot be ignored, even when the refresh messages 
are forwarded with highest priority. It should be considered that, when refresh 
messages are delayed, the flows can lose their reservations. Delayed refresh 
messages can create a cascade effect wherein additional messages become delayed 
downstream. To prevent this situation, the refresh messages themselves should be 
delivered within a bounded time. This, however, is impossible if the bandwidth 
requirements for the refresh messages are unknown, as is the case in per-flow 
management. Accordingly, it is highly desirable that the reservation state depend only
on network parameters, such as number of nodes, links, and classes, rather than the
behavioral patterns of the end users. Adopting such a paradigm provides network
designers with additional leverage to bound the bandwidth requirement of refresh
messages, and to allot them a fair share of the existing bandwidth, perhaps treating
them as another real-time flow at the links.

One approach to providing a scalable Intserv solution is to eliminate the per-flow 
reservation state in the core routers and follow a stateless approach similar to Diffserv. 
The SCORE architecture represents this approach to providing deterministic guarantees 
without per-flow state management in the core. SCORE moves the per-flow reservation
state from the routers into the packets of the flows. Each packet of a flow carries the
reservation and other dynamic state information that is required for scheduling. The
reservation state in the packets is utilized by the core routers to estimate the aggregate
reservation on the links. There are no explicit refresh messages and thus the problems
associated with lost or delayed refresh messages are greatly diminished. However, on
closer inspection, it should be appreciated that the estimation algorithms are heavily
driven by individual flow behavior. For instance, flows are required to send "dummy
packets" when their rate falls below a threshold to prevent errors from occurring within
the estimation algorithms that would result in inefficient utilization of network bandwidth.
In addition, this approach does not particularly reduce the processing or bandwidth
overhead needed for reservation maintenance, which is a major concern with RSVP. It
appears, therefore, that SCORE is an attractive but partial solution, because
mechanisms within the routers are heavily dependent on end-user behavior.

Therefore, a need exists for providing strict delay and bandwidth guarantees
within a scalable architecture based on network parameters. The present invention 
satisfies those needs, as well as others, and overcomes the deficiencies of previously 
developed architectures. 

BRIEF SUMMARY OF THE INVENTION 

The present invention pertains to a family of architectures in which the per-flow 
reservation state in the routers is replaced with a small bounded aggregate state. The 
size of the aggregate state and the complexity of the associated refresh mechanism are
determined by the network parameters, such as size and classes, rather than the
number of end-user flows. This enables design of a robust refresh mechanism in which 
refresh messages never experience unbounded queuing delays. The architectures are 
scalable and provide similar delays to the Intserv architecture. The invention can be 
viewed as a middle ground alternative between the stateful Intserv and our recently 
developed stateless architecture SCORE. 

The invention includes a number of aspects, such as a shaper-battery comprising 
a set of token-buckets arranged in the form of a tree for aggregating network flows into 
classes. In addition, a burst-drain-time or burst-ratio is utilized for aggregating flows. 
Furthermore, a reservation maintenance protocol referred to as "AGgregate 
REservation Establishment protocol" (AGREE), is provided to manage consistency of 
aggregate reservations. AGREE is the first reservation protocol that uses diffusing 
computations to maintain consistency of the reservations. Based on these flow 
aggregation techniques and reservation protocol, the invention also comprises
architectures (GSAI) that support real-time applications in large high-speed networks.

The present invention is based on the paradigm that Intserv solutions can be
scalable if the router state size and all related algorithms are solely governed by the
network parameters and not the end-user flows. It is believed that this approach should
yield a number of merits since network parameters are more stable than user behavior,
and architectures based solely on network parameters should therefore provide
improved stability and reliability. The present invention provides techniques that replace
the per-flow state and per-flow processing with mechanisms whose complexity is
essentially determined by the network parameters. These techniques echo the principal
reason behind the scalability of the current Internet architecture, wherein the routing
state is a function of the number of nodes in the network.

The present invention attempts to provide an Intserv solution in which the 
reservation state size and the complexity of the refresh algorithms are independent of 
the number of individual flows. This is achieved by replacing the full state of Intserv with 
a much smaller state that is static and can be determined a priori from the network 
structure. The key to such a reduction is flow aggregation, in which large numbers of 
flows are merged into a small set of aggregated flows based on such criteria as class 
and destination. The core routers maintain state only for aggregated flows and process 
only aggregated flows. The aggregation is such that, by providing the guarantees to the
aggregated flow, the guarantees of the individual flows within the aggregate are also
met. The reservation state is drastically reduced and, more importantly, the
state size arising out of these aggregation techniques is a function of the network
parameters, rather than the number of user flows, and thus is easily bounded.
Architectural complexity within the present approach, however, remains substantially
linear with respect to the number of existing network nodes.

The aggregation schemes described herein differ in a number of important
regards from those currently being proposed. It will be appreciated that the aggregation
technique proposed for RSVP is designed for aggregating reservation state of flows
within a single multicast group. In contrast, the aggregation method proposed
herein aggregates state of flows belonging to different multicast groups and as such is
orthogonal to aggregation within RSVP.

Aggregating flows based on destination pair and providing bandwidth guarantees
has been considered; however, the delay bounds offered in that proposal were not
deterministic. In other typical proposed aggregation techniques, the computation of delay
bounds in a dynamic environment is not generally discussed. A system architecture
based on a fluid model provides intuition and illustrates many key ideas in flow
aggregation, link scheduling, signaling and soft-state refresh mechanisms. In the next
section, non-fluid architectures approximating the system architecture are presented.

An object of the invention is to provide for the delivery of selected traffic over a 
given network which is subject to a predetermined maximum delay, and a bandwidth 
guarantee. 

Another object of the invention is to provide for the communication of selected
multimedia, or other real-time data, over a network at a sufficient bandwidth to assure
uninterrupted playback, and/or operation. 



UC00-303-2 






EL645676937US 



Another object of the invention is to utilize reservation states within the routers 
having small bounded aggregate state. 

Another object of the invention is to provide a reliable and robust method for 
maintaining the reservation state within the routers. 

Another object of the invention is to provide a scalable architecture which 
generally yields the best attributes of both Intserv and SCORE. 

Another object of the invention is to provide an architecture in which the 
complexity of the associated refresh mechanism is determined by network parameters 
instead of flow parameters. 

Another object of the invention is to incorporate a shaper-battery for aggregating 
network flows. 

Another object of the invention is to provide a reservation maintenance protocol 
that utilizes soft-states, but refreshes state on a per-aggregate basis. 

Further objects and advantages of the invention will be brought out in the 
following portions of the specification, wherein the detailed description is for the purpose 
of fully disclosing preferred embodiments of the invention without placing limitations 
thereon. 

BRIEF DESCRIPTION OF THE DRAWINGS 
The invention will be more fully understood by reference to the following 
drawings which are for illustrative purposes only: 

FIG. 1 is a schematic of a token-bucket utilized to allow for specifying the input
flow.

FIG. 2 is a schematic of traffic aggregation within a system architecture
according to an embodiment of the present invention, showing aggregation of traffic
for a particular destination.

FIG. 3 illustrates pseudocode for handling refresh messages in accordance with an
aspect of the present invention, shown with two procedures, AGRA() and DIFFCOMP().

FIG. 4 is a schematic of a shaper-battery according to an aspect of the present
invention, showing token-buckets arranged in the form of a tree.

FIG. 5 is a graph of introduced delay in relation to bucket size for the
shaper-battery shown in FIG. 4.

FIG. 6 is a schematic of a PKT-SP architecture according to an aspect of the
present invention, shown merging and shaping traffic for a particular destination.

FIG. 7 is a schematic of a packet distributor according to an aspect of the present
invention, shown distributing packets from a single flow into token buckets of three
outgoing flows.

FIG. 8 is pseudocode for a distributor according to FIG. 7, shown utilizing a
weighted round-robin distribution algorithm.

FIG. 9 is a schematic of the merging of path suffixes according to an aspect of
the present invention.

FIG. 10 is a pseudocode fragment of soft-state refresh performed on a per-label,
per-class basis according to an aspect of the present invention.

FIG. 11 is a schematic of a regulator according to an aspect of the present
invention, shown shaping aggregate flows at a receiving end.

FIG. 12 is pseudocode for event handling within the AGREE protocol according
to an aspect of the present invention.

FIG. 13 is a graph of delay for packets of variable length in response to path
length for a number of architectures, shown for an audio flow bandwidth range.

FIG. 14 is a graph of delay for packets of variable length in response to path
length for a number of architectures, shown for a video flow bandwidth range.

FIG. 15 is a graph of delay for 100 byte packets in response to path length for a
number of architectures, shown for an audio flow bandwidth range.

FIG. 16 is a graph of delay for 100 byte packets in response to path length for a
number of architectures, shown for a video flow bandwidth range.

FIG. 17 is a graph of delay for 300 byte packets in response to path length for a
number of architectures, shown for an audio flow bandwidth range.

FIG. 18 is a graph of delay for 300 byte packets in response to path length for a
number of architectures, shown for a video flow bandwidth range.

FIG. 19 is a graph comparing call-blocking rates under load according to an
aspect of the present invention with that of SCORE and Intserv.

FIG. 20 is a graph of state size in response to load according to an aspect of the
present invention, showing the greater stability of label state size in relation to per-flow
routing states.

DETAILED DESCRIPTION OF THE INVENTION

Referring more specifically to the drawings, for illustrative purposes the present
invention is embodied in the apparatus and methods generally shown in FIG. 1 through
FIG. 20. It will be appreciated that the apparatus may vary as to configuration and as to
details of the parts, and that the method may vary as to the specific steps and
sequence, without departing from the basic concepts as disclosed herein.
1. System architecture
1.1. Flow Aggregation

The well-known token-bucket parameters for input flow specification are $(\sigma, \rho)$,
where $\sigma$ is the maximum burst size of the flow and $\rho$ is the average rate of the flow.
Flow characteristics are enforced at the entrance using a token-bucket as exemplified
within FIG. 1. If the flow has $\sigma = 0$, which is possible only in the fluid model, then the
flow is called a 0-burst flow.
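
By way of illustration only, the token-bucket behavior described above may be sketched as follows. The sketch assumes sizes in bits, times in seconds, and a bucket that starts full; the function names `conforms` and `shaping_delay` are hypothetical and are not taken from the figures.

```python
def conforms(packets, sigma, rho):
    """Check a packet trace against a (sigma, rho) token-bucket.

    packets: list of (arrival_time_s, size_bits), sorted by time.
    The bucket starts full and refills at rho bits/s up to sigma bits.
    """
    tokens, last = sigma, 0.0
    for t, size in packets:
        tokens = min(sigma, tokens + rho * (t - last))
        last = t
        if size > tokens:
            return False  # the burst exceeds what the bucket allows
        tokens -= size
    return True


def shaping_delay(burst, sigma, rho):
    """Time for the last bit of a burst to leave a fluid (sigma, rho) shaper.

    With sigma = 0 (a 0-burst shaper) the whole burst drains at rate rho.
    """
    return max(0.0, burst - sigma) / rho
```

Under these assumptions, a 1000-bit burst entering a 0-burst shaper of rate 100 bits/s waits at most 10 seconds, which is the shaping delay a flow would incur at the ingress in the fluid model.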

A principal concept in the system architecture is that traffic in the routers is
aggregated on a per-destination basis. The resource reservations are stored and
refreshed on a per-destination basis, rather than on a per-flow basis. Routers only
know the rates of incoming traffic on the links and the rates of outgoing traffic for each
destination; they do not, however, maintain information on the rates of each flow.

FIG. 2 illustrates 10 how traffic bound for a particular destination is aggregated at
a node 12, whose routing is controlled by a routing table 14. Each flow entering the
router is shaped to a 0-burst flow using a token-bucket with a bucket size set to zero,
which is referred to as a 0-burst shaper. Flows 16 which originate at router node 12,
and flows 18 which originate from neighboring nodes are shown being shaped upon
entering router 12. Because of the fluid model, link schedulers do not introduce any
jitter and thus flows arriving from the neighbors are 0-burst flows. Therefore, all flows
with the same destination can readily merge and the resulting aggregated flow 20 is a
0-burst flow. Each packet received by router 12 is forwarded to the respective outgoing
link according to routing table 14. For simplicity, flows are always established along the
single shortest-path from source to destination, a restriction that is relaxed in the
architectures introduced in the next section. The routing table entry at router $i$ for
destination $j$ is of the form $(j, B^i_j, PB^i_j, s)$, where $B^i_j$ is the total rate of traffic for $j$, $s$
is the next-hop on the shortest-path from $i$ to $j$, and $PB^i_j = \{ B^i_{j,k} \mid k \in N^i \wedge k \neq s \}$.
When a new flow with rate $\rho$ and destination $j$ is established through $i$, the
bandwidth $B^i_j$ is incremented by $\rho$. And when the flow is terminated, $B^i_j$ is
decremented by $\rho$. Alternatively, reservations can be timed out instead of using an
explicit tear down. The signaling and soft-state maintenance of $B^i_j$ are described later
in the section.
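
The per-destination bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming integer rates and a fixed next-hop per destination; the class name `Router` and its methods are hypothetical. It keeps only the total reserved rate per destination, so the table size does not grow with the number of flows.

```python
class Router:
    """Per-destination reservation state at one router (fluid model)."""

    def __init__(self, routes):
        # routes: destination -> next hop on the shortest path
        self.next_hop = dict(routes)
        # total reserved rate per destination; one entry per destination
        self.rate = {dest: 0 for dest in routes}

    def establish(self, dest, rho):
        """A new flow of rate rho toward dest is set up through this router."""
        self.rate[dest] += rho
        return self.next_hop[dest]

    def terminate(self, dest, rho):
        """Explicit tear-down; a timeout could be used instead."""
        self.rate[dest] -= rho
```

For example, establishing three flows of rates 3, 4 and 5 toward the same destination leaves a single table entry with a total rate of 12.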

1.2. Link Scheduler

Each destination-aggregated flow arriving at a link scheduler is a 0-burst flow and
hence can be merged with flows of other destinations. The link scheduler of link $(i, k)$
maintains only the total allocated bandwidth $TB^i_k$ for real-time flows on that link, which is
equal to the sum of all $B^i_j$ for which $k$ is the next-hop neighbor for $j$. The link
scheduler employs weighted fair queuing to service the real-time flow at rate $TB^i_k$.
There is no per-flow reservation information in the link scheduler. The flow specific
information is maintained only at the entry router. The link admission test is a simple
O(1) operation; a flow is admitted only if the available bandwidth $C^i_k - TB^i_k$, where $C^i_k$ is
the capacity of link $(i, k)$, is greater than or equal to the rate of the flow. It is assumed,
for the sake of simplicity, that all bandwidth on the link is available for real-time flows.
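
The constant-time admission test can be illustrated with the following sketch, which assumes integer rates and that the entire link capacity is available to real-time flows, as stated above. The class name `LinkScheduler` and its attribute names are hypothetical.

```python
class LinkScheduler:
    """Tracks only the total allocated real-time bandwidth on one link."""

    def __init__(self, capacity):
        self.capacity = capacity   # link capacity
        self.allocated = 0         # total allocated bandwidth on the link

    def admit(self, rate):
        """O(1) test: admit only if the available bandwidth covers the rate."""
        if self.capacity - self.allocated >= rate:
            self.allocated += rate
            return True
        return False

    def release(self, rate):
        """Return bandwidth when a reservation ends or times out."""
        self.allocated = max(0, self.allocated - rate)
```

No per-flow record is consulted in `admit`; the decision depends only on the running total.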
1.3. Reservation Maintenance 

Unlike RSVP, the approach of the present invention utilizes per-destination
refresh messages instead of per-flow refresh messages. A refresh message specifies a
destination and the bandwidth for that destination. Let $T_R$ be the refresh period, and let
the refresh messages received for $j$ in the previous refresh period specify a total
bandwidth of $BT^i_j$, which is compared with $B^i_j$. If $BT^i_j < B^i_j$, then a refresh message
is sent to next-hop $s$ with bandwidth $BT^i_j$, thus releasing bandwidth $B^i_j - BT^i_j$.
Otherwise, a refresh message with bandwidth $B^i_j$ is sent. FIG. 3 exemplifies a set of
pseudocode procedures for performing reservation maintenance, comprising an
aggregation procedure 30, AGRA(); and a distribution procedure 40, DIFFCOMP().
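
The per-destination refresh rule described above may be sketched as follows. This is not the AGRA() pseudocode of FIG. 3, which is not reproduced here; it is a minimal illustration, assuming integer bandwidths, with hypothetical names (`RefreshAggregator`, `on_refresh`, `end_of_period`). The case in which the refreshed total exceeds the reservation is simply capped here, whereas the specification handles it with a diffusing computation.

```python
from collections import defaultdict


class RefreshAggregator:
    """Aggregates refresh messages per destination over one refresh period."""

    def __init__(self, reserved):
        self.reserved = dict(reserved)     # destination -> reserved bandwidth
        self.received = defaultdict(int)   # destination -> refreshed total

    def on_refresh(self, dest, bandwidth):
        """Accumulate a refresh message received during the current period."""
        self.received[dest] += bandwidth

    def end_of_period(self):
        """Return one outgoing refresh per destination: dest -> bandwidth."""
        outgoing = {}
        for dest, reserved in self.reserved.items():
            refreshed = self.received.get(dest, 0)
            if refreshed < reserved:
                # Release the part of the reservation that was not refreshed.
                self.reserved[dest] = refreshed
            outgoing[dest] = self.reserved[dest]
        self.received.clear()
        return outgoing
```

The number of outgoing messages per period equals the number of destinations in the table, regardless of how many flows contributed to each total.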

The source of a flow sends its refresh message to the ingress node every $T_R$
seconds. At the ingress, all refresh messages of a particular destination are
aggregated. When a flow terminates, the source stops sending the refresh messages
and the bandwidth reserved for the flow is eventually timed out and released. During
the establishment of a flow, when the signaling message of the flow arrives at $i$, $B^i_j$ is
incremented and an implicit refresh message of that bandwidth is assumed to have
arrived. For correct operation, the signaling message must spend no more than $T_R$
seconds at each hop. When a link fails, this information is propagated through the
network by the routing protocol, such as OSPF. A source router utilizing the link to
support a flow immediately terminates that flow. When the network is stable, all
overstated and understated reservations will be eventually corrected, provided the
refresh messages are never lost. In each refresh period, at most O(n) refresh
messages are sent on a link, regardless of the number of flows in the network. This
model is scalable, in view of the fact that worst-case bounds on state size depend on
the number of active destinations, rather than the number of individual flows. Given that
the bandwidth requirements for refresh messages are known a priori, they can be
serviced as a separate queue in the link scheduler, thereby guaranteeing their bandwidth
and bounding the delays accordingly. Hence, refresh messages are never lost due to
buffer overflows as in RSVP; they are lost only due to link failures. In contrast, the number
of refresh messages in RSVP is unbounded and depends on the number of flows in the
network. In SCORE, reservation information is carried in the packets, so bandwidth
provision is made implicitly when flows are established and, as a result, no reservation
information is lost.

Inconsistencies due to link failures are easy to correct, insofar as the routing 
algorithm informs the sources about link failures whereby the source nodes can 
terminate all flows that use a failed link and eventually all bandwidth used by those 
flows is timed out. The sources only need to remember the path utilized by each flow. 

In the presence of refresh message losses, the problem becomes more
difficult. When a refresh message is lost, the downstream node times out and releases
a portion of the bandwidth. In the next cycle, when the refresh message is received
correctly, the bandwidth to refresh is greater than the bandwidth reserved. In this
scenario, diffusing computations are utilized herein to inform upstream nodes of the
situation and ask them to release the required bandwidth. In fact, this same or a similar
mechanism can be utilized to correct inconsistencies resulting from link failures. The
diffusing computation operates as follows. When a node detects an inconsistency in the
reservations, it must release upstream bandwidth. It first terminates as many flows
as possible at the node to satisfy the bandwidth. If there is still some bandwidth to
release, it distributes the bandwidth among upstream nodes that send traffic to this
node. It then sends RELEASE messages to those upstream nodes, and enters a WAIT
state while it waits to receive ACK messages from the upstream nodes. If further
RELEASE messages are received while it is in the WAIT state, it immediately
sends back an ACK message. After all ACK messages are received, it transitions to the
READY state. If the transition to the WAIT state was triggered by a RELEASE message
from the downstream node, it sends an ACK message to the downstream node.
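
The READY/WAIT behavior described above can be sketched as a small state machine for a single destination. This is an illustrative sketch rather than the DIFFCOMP() pseudocode of FIG. 3. The class name `Node`, the `outbox` list standing in for message transmission, and the greedy rule for distributing the remaining bandwidth among upstream neighbors are assumptions; the specification does not state how the remainder is divided.

```python
class Node:
    """One router's view of a diffusing computation for one destination."""

    def __init__(self, local_flows, upstream):
        self.local_flows = list(local_flows)  # rates of flows that start here
        self.upstream = dict(upstream)        # neighbor -> rate it sends to us
        self.state = "READY"
        self.pending = set()                  # neighbors whose ACK is awaited
        self.parent = None                    # downstream node to ACK, if any
        self.outbox = []                      # (kind, neighbor, amount)

    def release(self, amount, sender=None):
        """Handle a detected inconsistency or a RELEASE from downstream."""
        if self.state == "WAIT":
            if sender is not None:            # already busy: ACK immediately
                self.outbox.append(("ACK", sender, 0))
            return
        while amount > 0 and self.local_flows:
            amount -= self.local_flows.pop()  # terminate local flows first
        for nbr in sorted(self.upstream):     # assumed greedy distribution
            if amount <= 0:
                break
            share = min(amount, self.upstream[nbr])
            if share > 0:
                self.upstream[nbr] -= share
                self.outbox.append(("RELEASE", nbr, share))
                self.pending.add(nbr)
                amount -= share
        if self.pending:
            self.state, self.parent = "WAIT", sender
        elif sender is not None:
            self.outbox.append(("ACK", sender, 0))

    def ack(self, nbr):
        """Handle an ACK from an upstream neighbor."""
        self.pending.discard(nbr)
        if self.state == "WAIT" and not self.pending:
            self.state = "READY"
            if self.parent is not None:
                self.outbox.append(("ACK", self.parent, 0))
            self.parent = None
```

In this sketch a node that can satisfy the release entirely from its own flows never enters the WAIT state and acknowledges at once.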

A reservation maintenance protocol referred to as "AGgregate REservation
Establishment protocol" (AGREE), is provided to manage consistency of aggregate
reservations. The correctness of AGRA can be argued informally as follows; a
formal proof is omitted for brevity. The refresh algorithm should be such that, after a
sequence of link failures and refresh message losses, if no new flows are set up or
terminated, then within a finite time all reservations must reflect a consistent state, such that in
all future refresh periods the refresh messages received by a node for a particular
destination equal the reservations made for that destination. All diffusing
computations must terminate, because the topology stabilizes and the routing protocol
ensures that loop-free shortest paths are established for each destination within a finite
time. Note that both the RELEASE messages of diffusing computations and the refresh
mechanism in AGRA only decrease the allocated bandwidth, and bandwidths cannot
decrease forever. Therefore, there can be no more diffusing computations after a finite
time. At this time, the bandwidth specified by refresh messages for a particular
destination at all the nodes can only be less than or equal to the reserved bandwidth at
that node, otherwise additional diffusing computations would be triggered. If the refresh
messages, however, specify a lower bandwidth than the reserved bandwidth then that
bandwidth is released. Eventually, all reservations converge to a consistent state.

In spite of flow aggregation, delay guarantees can be provided for each individual flow. As
a result of using a fluid model assumption, the delay experienced by a flow consists only
of the waiting time at the 0-burst shaper at the ingress node and the propagation delays
of the links on its path, which gives a bound of $\frac{\sigma}{\rho} + \sum_{(i,k) \in P} \tau_{ik}$, where $\tau_{ik}$ is the
propagation delay of link $(i, k)$ and $P$ is the path of the flow. The state size is $O(N)$.
The refresh messages on a link are $O(N)$. In a non-fluid model aggregating flows is
not as simple as in the fluid model. The next section describes various techniques for
merging non-fluid flows.
2. Flow Classes 

2.1. Classes based on Packet Sizes (PKT classes) 
FIG. 4 illustrates a shaper-battery 50 to provide flow aggregation wherein
incoming flows 52 are aggregated into outgoing $L$-burst flow 54. Four shapers 56a
through 56d receive and shape the incoming flows, which are aggregated through
intermediate shapers 58a, 58b, and a final shaper 60 for final aggregation into flow 54.
Recall that, in the system architecture, flows and aggregate flows are always shaped to
0-burst flows before merging with other flows. Similarly, in the non-fluid architecture,
flows and aggregate flows are shaped to the form $(L, \rho)$, where $L$ is the maximum size
of any packet of the flow and $\rho$ is the rate of the flow or aggregate flow. This form of
flow is herein referred to as an $L$-burst flow; that is, the maximum burst is no more
than the maximum packet size.

Assume that there are $Q$ classes and a packet size $L_g$ is associated with each
class $g$. A flow belongs to class $g$ if the maximum size of its packets is smaller than
$L_g$, but greater than $L_{g-1}$. In the routers, only flows that belong to the same class are
merged and the link schedulers process aggregate flows that belong to one of the
classes. By providing guarantees to the aggregate flow, the guarantees follow 
automatically for individual flows in the class aggregate. 
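
The assignment of flows to packet-size classes can be illustrated as follows. The class sizes in `CLASS_SIZES` are hypothetical values chosen for the example, classes are numbered from zero, and a flow whose maximum packet size exactly equals a class size is assumed to belong to that class.

```python
from bisect import bisect_left
from collections import defaultdict

CLASS_SIZES = [100, 300, 1500]  # hypothetical class packet sizes, in bytes


def classify(max_packet_size, class_sizes=CLASS_SIZES):
    """Return the index of the smallest class whose size covers the flow."""
    g = bisect_left(class_sizes, max_packet_size)
    if g == len(class_sizes):
        raise ValueError("packet larger than the largest class size")
    return g


def aggregate(flows, class_sizes=CLASS_SIZES):
    """flows: iterable of (max_packet_size, rate). Returns class -> total rate."""
    totals = defaultdict(int)
    for size, rate in flows:
        totals[classify(size, class_sizes)] += rate
    return dict(totals)
```

Only flows that map to the same class index are summed into one aggregate, so the number of aggregates per destination is bounded by the number of classes.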

Even when flows belonging to the same class are merged, there is going to be
burstiness, which must be removed when necessary by reshaping them to an $L$-burst
flow. To study the delays caused by reshaping, consider the following scenario. If a
flow $f$ of the form $(L_g, \rho_f)$ is serviced by a WFQ at a link, the delay is given by:

$$\frac{L_g}{\rho_f} + \frac{L_{max}}{\rho_f} + \frac{L_{max}}{C} + \tau_{ik} \qquad (1)$$

Assume that flow $f$ is merged with $n-1$ other $L_g$-burst flows, resulting in an
aggregate flow with token-bucket parameters $(nL_g, \rho_g)$, where $\rho_g$ is the sum of rates of
the participating flows. The delay bound offered by WFQ to the aggregate is then
$\frac{nL_g}{\rho_g} + \frac{L_{max}}{\rho_g} + \frac{L_{max}}{C} + \tau_{ik}$. However, this delay cannot be used to determine the end-to-end delay
bound for the flow $f$, because flow $f$ may merge with different flows at different times
in a dynamic environment. If the aggregate is first shaped to an $L_g$-burst flow before
reaching the link scheduler, then the aggregate flow has a delay bound of
$\frac{L_g}{\rho_g} + \frac{L_{max}}{\rho_g} + \frac{L_{max}}{C} + \tau_{ik}$.
Because $\rho_f < \rho_g$, flow $f$ can then use the delay bound of
$\frac{L_g}{\rho_f} + \frac{L_{max}}{\rho_f} + \frac{L_{max}}{C} + \tau_{ik}$ at
the link in its computation of end-to-end delay bound. It should be appreciated that at
this point only the delays due to shaping of the aggregate to an $L$-burst
flow remain to be incorporated. For this purpose, a device referred to as a shaper-battery is introduced.

It is assumed that the maximum number of flows $n$ that will ever be aggregated
into one flow is known a priori. A straightforward approach for shaping a flow of form
$(nL_g, \rho_g)$ to the flow of form $(L_g, \rho_g)$ is to utilize a single token-bucket with bucket size
$L_g$ and rate $\rho_g$. This introduces a delay in the token-bucket that is at most $\frac{(n-1)L_g}{\rho_g}$, as
shown in FIG. 5. Again, because $\rho_f < \rho_g$, a bound of $(n-1)L_g/\rho_f$ can be used to
compute the end-to-end delay bound for a flow. Problems arise as $n$ becomes
sufficiently large wherein this bound becomes very high and is not useful to the end
user. The shaper-battery described below reduces this bound to $\frac{L_g \log_2 n}{\rho_f}$.

A shaper-battery is a set of token-buckets arranged in the form of a tree, wherein
each token-bucket or shaper has a fixed bucket size of $L_g$, but the rate is dynamically
adjustable. A leaf shaper has one input while an internal shaper has two inputs. The
rate of an internal shaper is the sum of the rates of its two children. The output of any
shaper in the shaper-battery is an $L_g$-burst flow. FIG. 4 shows a shaper-battery of
height two which can shape up to four flows to an $L_g$-burst flow. A shaper-battery of height
$h$ can aggregate $2^h$ flows. The shaper-battery is always initialized such that the
buckets are set to $L_g$ before any flows are established through the battery. When flows
are set up and terminated, the rates of the shapers are adjusted. For example, when a
new flow of rate $\rho_f$ is established, one of the available leaf shapers of the shaper-battery
is assigned to the flow, the shaper's rate is set to $\rho_f$, and the rate of each of the
buckets on the path to the root of the battery is incremented by $\rho_f$. At any internal
shaper the maximum delay experienced by a packet is bounded by $L_g/\rho_f \ge L_g/\rho_a$, where $\rho_a$
is the rate of the shaper. The maximum delay that a packet of the flow faces in the
shaper-battery is bounded by $hL_g/\rho_f$. As an example, if 65536 $L_g$-burst flows are merged
and shaped to an $L_g$-burst flow using a single token-bucket, the delay bound is as high
as $65535 \times L_g/\rho_f$. Using the shaper-battery, it is reduced to $16 \times L_g/\rho_f$.
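The rate bookkeeping and the two delay bounds discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the class name `ShaperBattery`, the heap-style array layout, and the helper names are hypothetical, and only the shaper rates are modeled (no packets or buckets).

```python
import math


class ShaperBattery:
    """Rates of a binary tree of shapers, stored heap-style (root at index 1)."""

    def __init__(self, height):
        self.height = height
        self.rates = [0.0] * (2 ** (height + 1))  # index 0 is unused
        # Leaves occupy indices 2^h .. 2^(h+1) - 1.
        self.free_leaves = list(range(2 ** height, 2 ** (height + 1)))

    def add_flow(self, rate):
        """Assign a free leaf and add `rate` to every shaper up to the root."""
        leaf = self.free_leaves.pop()
        self._adjust(leaf, rate)
        return leaf

    def remove_flow(self, leaf, rate):
        """Undo add_flow at flow teardown."""
        self._adjust(leaf, -rate)
        self.free_leaves.append(leaf)

    def _adjust(self, node, delta):
        # Walk from the leaf to the root, updating each shaper on the path.
        while node >= 1:
            self.rates[node] += delta
            node //= 2


def single_bucket_bound(n, L_g, rho_f):
    """(n - 1) * L_g / rho_f: one token-bucket shaping n merged flows."""
    return (n - 1) * L_g / rho_f


def battery_bound(n, L_g, rho_f):
    """h * L_g / rho_f with h = ceil(log2(n)), the height needed for n flows."""
    return math.ceil(math.log2(n)) * L_g / rho_f
```

With n = 65536 and $L_g/\rho_f$ taken as one unit, the two helpers return 65535 and 16 units respectively, which matches the example in the text.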

The delay bound for the flow can be further reduced if a minimum rate $\rho_{min}$ is assumed for
any real-time flow. The flow experiences the worst-case delay when it
merges with traffic of bandwidth $\rho_{min}$ on the other input of a shaper at every hop in
the shaper-battery. At the $n$-th stage of the shaper-battery, a flow faces a flow on the
other input which has bandwidth of at least $n\rho_{min}$. The delay bound can be reduced to:

t h 1 

— S— (2) 



where Y = . For a flow with rate p min , Eq. (2) reduces to — V h _ — !— . When h = 

L 



Pf Pf 



16, it is approximately equal to 3. 18 x — . This bound will be useful for giving workable 

delay bounds for low-bandwidth flows such as audio flows.
It is not required that all shapers of the battery be created up front; the shaper-battery
can grow dynamically as new flows arrive. However, the limit on the maximum
depth of the shaper-battery should be enforced in order to give the delay bound. Each
bucket needs an input buffer of size $L_g$. If $h$ is the maximum height of a shaper-battery
then it requires at most $2^{h+1}$ buffers of size $L_g$.

2.2. Classes based on Burst-Drain-Times (BDT Classes) 
In the previous section, flows are classified solely on the maximum size of their
packets. Another method is now described for defining flow classes based on the notion of
burst-drain-time (BDT). Given a flow $A_f$ with parameters $(\sigma_f, \rho_f)$, the burst-drain-time
$\tau_f$ of the flow is the time to transmit one bucket at the rate of the flow, that is,
$\tau_f = \sigma_f/\rho_f$. The flow $A_f$ can alternatively be specified as $(\tau_f, \rho_f)$.

A remarkable property of flows specified using the BDT is that flows with the
same BDT can be merged without changing the BDT of the resulting flows. The amount
of traffic of flow $f$ that arrives in an interval $[\tau, t]$ for this flow is given by:

$A_f(\tau, t) \le \sigma_f + \rho_f (t - \tau)$   (3)

$A_f(\tau, t) \le \rho_f (\tau_f + (t - \tau))$   (4)

If two flows $A_1$ and $A_2$ with traffic profiles $(\tau_1, \rho_1)$ and $(\tau_2, \rho_2)$, specified using
BDT, merge into a single aggregate flow and $\tau_1 \le \tau_2$, the amount of traffic that arrives
in an interval $[\tau, t]$ for the aggregate flow $A$ is given by:

$A(\tau, t) \le \rho_1 (\tau_1 + (t - \tau)) + \rho_2 (\tau_2 + (t - \tau))$   (5)

$\le (\rho_1 + \rho_2) \left( \frac{\rho_1 \tau_1 + \rho_2 \tau_2}{\rho_1 + \rho_2} + (t - \tau) \right)$   (6)

$\le (\rho_1 + \rho_2) (\tau_2 + (t - \tau))$   (7)

Eq. 7 states that burstiness of the resulting merged flow cannot be greater than the
burstiness of the more bursty of the two input flows. Therefore, the resulting merged
flow can be characterized by $(\tau_2, \rho_1 + \rho_2)$. The BDT parameter is used to define flow
classes as follows: it will be assumed that there are $Q$ real-time classes. With each
real-time class $g$ a BDT $R_g$ is associated such that $R_{g-1} < R_g$ and $R_0 = 0$. At the
source, a flow with specification $(\tau_f, \rho_f)$ is classified as belonging to class $g$ if its
burst-ratio $\tau_f$ is such that $R_{g-1} < \tau_f \le R_g$. From Eq. 7 it follows that, if two flows belonging
to the same class $g$ are merged, then the resulting flow also belongs to the same class
$g$.
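The class assignment and the merging property of Eqs. 5 to 7 can be illustrated with a short sketch. The function names, the list `bounds` holding $R_1, \ldots, R_Q$, and the choice to treat the upper class boundary as inclusive are assumptions made for illustration only.

```python
from bisect import bisect_left


def burst_drain_time(sigma, rho):
    """BDT of a (sigma, rho) flow: time to drain one bucket at the flow's rate."""
    return sigma / rho


def merge(flows):
    """Merge (tau, rho) flows and return (tau, rho) of the aggregate.

    The aggregate burst is sum(rho_i * tau_i), so its BDT is the
    rate-weighted average of the input BDTs (the bracketed term of Eq. 6),
    which never exceeds the largest input BDT (Eq. 7).
    """
    total_rate = sum(rho for _, rho in flows)
    tau = sum(t * rho for t, rho in flows) / total_rate
    return tau, total_rate


def classify(tau, bounds):
    """Return class g (1-based) with R_{g-1} < tau <= R_g, where R_0 = 0.

    `bounds` is the ascending list [R_1, ..., R_Q].
    """
    if tau <= 0 or tau > bounds[-1]:
        raise ValueError("BDT outside the range covered by the classes")
    return bisect_left(bounds, tau) + 1
```

For example, with `bounds = [0.01, 0.05, 0.2]`, two flows with BDTs 0.02 and 0.04 both fall in class 2, and their aggregate has a BDT between 0.02 and 0.04, so it stays in class 2.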

Though the preceding description of classes is based on the burst-size of the
flow, we actually define classes based on the maximum packet size of the flow,
because we assume all flows are shaped down to a single-packet-size burst at the
entrance before merging with other flows. That is, the BDT of a flow $f$ is defined by
$L_f/\rho_f$, where $L_f$ is the maximum packet size of flow $f$.
3. GSM Architectures

This section presents a series of architectures based on the aggregation
technique previously described. Architectures PKT-SP, PKT-MP and PKT-LS use the
PKT classes, while BDT-DF, BDT-LS and BDT-MP use the BDT classes. The
architectures employ various routing methods: single-path, multipath and label-switched
paths. Table 1 serves as a quick reference.
3.1. PKT-SP Architecture 

This section presents the first non-fluid architecture PKT-SP, specifically PKT
classes with shortest path routing, which closely approximates the system architecture.
Except for the non-fluid model assumption and PKT classes, the architecture is
essentially the same as the system architecture. At the ingress router, packets are
marked with the flow's class tag. In the core routers, all packets of the same destination
and class are aggregated. The main challenge is to remove bursts introduced due to
the non-fluid nature of flows. FIG. 6 shows the schematic example of the PKT-SP
architecture 70 utilized for merging and shaping traffic for a particular destination and
class. After merging all flows 72 of a particular destination $j$ and class $g$ that
originated at the router, they are shaped to an $L_g$-burst flow using shaper-battery A 74.
The output is then aggregated with incoming flows from neighbors 76 using a
shaper-battery B 78, with flows of the same destination and class received on the incoming
links. The output of shaper-battery B is an $L_g$-burst flow when it reaches the link
scheduler 80 which forwards the packets. The link scheduler processes traffic on a
per-destination, per-class basis, and there is one queue for each class-destination pair.
Packet forwarding operations are performed in relation to routing table 82 which
generates an aggregated outgoing flow 84.

Although the benefits of per-hop shaping can be realized in those proposals,
such as reduction in buffer sizes, these benefits are largely undone by the per-flow
traffic management that must be employed.

The routing table is of the form $(j, B^i_j, k)$, where $j$ is the destination, $B^i_j$ is the
total rate of the traffic arriving at the node with that destination, and $k$ is the next-hop for
the destination. Assuming each queue is processed by the WFQ link scheduler at the
cumulative rate reserved for the corresponding class-destination pair, the maximum
delay experienced by a packet at the link is bounded by $2L_g/B^i_j + L_{max}/C$. For a particular
flow $f$ with rate $\rho_f$ and destination $j$, $B^i_j$ changes with time though $\rho_f \le B^i_j$.
Therefore, the delay bound $2L_g/\rho_f + L_{max}/C$ at a link can be used for the flow. Because
delay bounds have a reciprocal relation to flow bandwidth, tighter delay bounds can be
provided under aggregation, but because the delay bounds once offered to a flow at
setup time must be valid throughout the lifetime of the flow, the delay bound that is a
function of only the flow parameters must be offered. The jitter acquired by the traffic
passing through a link is removed at the next hop by the shaper-battery B of the
corresponding destination.

The rates of all shapers on the path of a flow are incremented at flow setup by
the rate of the flow and similarly decremented at flow teardown. The soft-state
maintenance of reservation bandwidth is the same as in the reference model of FIG. 3.
The admission tests take $O(1)$ time, which is much simpler than conventional
admission tests, which depend on the reservations already made to other flows and are
generally complex. Let $f$ be a flow with path $P$ from $i$ to $j$ and rate $\rho_f$, and let $h_1$
represent the height of the shaper-battery A. The maximum delay experienced in the
shaper-battery A is bounded by $h_1 L_g/\rho_f$. The maximum delay experienced in a
shaper-battery B of node $i$ is $(L_g/\rho_f)(\lceil \log_2(N_i + 1) \rceil + 1)$. Let $d_{ik}$ be the delay
of the link $(i, k)$. Then the maximum delay experienced by this flow is bounded by:

d ik = ^L + it + ^ + r, (8) 

Pf Pf c ik 
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Dp = M + ^ ^([^(^+1)]+^) (9) 

It will be appreciated that in dense networks, although the number of neighbors
of a node can be high, the number of hops tends to be lower at the same time, effectively
reducing the end-to-end delays. The above bound is not as tight as the delay bounds
obtained using WFQ with per-flow processing.

End-user applications can reduce their delays either by reducing the packet sizes
or increasing the reserved bandwidth. Reserving extra bandwidth for each flow,
however, tends to waste bandwidth. Instead, extra bandwidth can be allocated on a
per-class basis. That is, if a queue, such as per-destination, per-label, or per-class, is
added to a link scheduler, the queue is serviced at a minimum rate called the threshold
rate. When a flow is added to a link and there is no other flow of the same class, a new
queue is added to the link scheduler which is processed at the threshold rate. If the
total rate for the class exceeds the threshold, the queue is serviced at that rate,
otherwise the queue is serviced at the minimum threshold rate. This thresholding
technique reduces delay bounds and is especially beneficial to "thin flows," which tend
to have very loose bounds.
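The thresholding rule can be written in a few lines. The sketch below is illustrative only; the function names and the use of "packet size divided by service rate" as a stand-in for the rate-dependent part of a delay bound are assumptions, not terms taken from the specification.

```python
def service_rate(reserved_rate, threshold_rate):
    """Rate at which the link scheduler serves a queue under thresholding.

    The queue is served at the total reserved rate of its class once that
    exceeds the threshold, and at the threshold rate otherwise.
    """
    return max(reserved_rate, threshold_rate)


def rate_dependent_delay(packet_bits, reserved_rate, threshold_rate):
    """Packet size over service rate: the part of a bound that shrinks with rate."""
    return packet_bits / service_rate(reserved_rate, threshold_rate)
```

For a thin 64 kb/s flow with 8000-bit packets, the rate-dependent term is 0.125 s when the queue is served at the reserved rate, and 0.008 s when a 1 Mb/s threshold rate applies.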

By using the foregoing thresholding technique, the bound can be reduced for
long paths. In thresholding, a queue is always serviced at a minimum rate. For shorter
paths the WFQ bounds are still better, but the consequence of this should be evaluated with respect
to the requirements of the applications. The present aim is not to minimize end-to-end
delay bounds, but to provide delays within a "delay limit" that is sufficient to run most
real-time applications. For instance, CCITT states that a one-way delay bound of one
hundred and fifty milliseconds (150ms) is sufficient for most voice and video
applications. Giving tighter bounds than this would not particularly benefit the users
because of the effects of human perception. Therefore, the present aim is to meet the
bound of one hundred and fifty milliseconds (150ms). For shorter paths, the present
approach meets that bound, though its bounds are looser than those of WFQ. Tight
delay bounds are, however, useful to reduce the buffer requirements at the receiving
router. Experiments will later be described which compare these delay bounds with
those obtained from other methods. Classes can be introduced based on the packet
sizes. With each class a packet size is associated. At the link scheduler there is
per-destination per-class scheduling. The shaper-batteries A and B are on a per-destination
per-class basis.

3.2. PKT-MP Architecture

The PKT-MP architecture extends PKT-SP to use multiple shortest paths that are
computed by OSPF and other recent routing protocols. This improves the use of link
bandwidth and improves call-acceptance rates. The complexity of the router is not
increased, though PKT-MP provides slightly looser delay bounds than PKT-SP. The
key idea, however, is to illustrate how a single flow can use multiple paths for packet
forwarding.

Let $S^i_j$ be the set of next-hop choices at $i$ that give the shortest distance to $j$.
Packets received by $i$ destined for node $j$ are only forwarded to neighbors in $S^i_j$.
Because there can be more than one next-hop at a router for packet forwarding,
bandwidth for each of the next-hops must be specified. Formally, let $B^i_{j,g}$ be the total
rate of traffic of class $g$ that router $i$ receives, such as from hosts directly connected to
it and from its neighbors, that is destined for router $j$. For each $k \in S^i_j$, let $B^i_{j,g,k}$
specify the bandwidth that is forwarded to neighbor $k$. Let $SB^i_{j,g} = \{B^i_{j,g,k} \mid k \in S^i_j\}$.
Assuming the network does not lose packets, B' ; = Y , ci B l k = Y , „ 5' .A 

routing table entry is of the form $(j, g, B^i_{j,g}, S^i_j, SB^i_{j,g})$. When a router $i$ receives a packet
for router $j$ it determines the next-hop $k$ for this packet using a distributor to allocate
packets to next-hops in proportion to their bandwidths. FIG. 7 depicts distribution 90
wherein a flow 92 is received by a distributor 94. The flows are distributed across a set
of three token buckets 96a, 96b, 96c. The router then puts the packet in the queue $j$ of
the link scheduler of the link to $k$. The time complexity of the weighted round-robin
discipline used for determining the next hop by the distributor is constant because there
is a fixed number of neighbors. FIG. 8 depicts an example of pseudocode for
performing the above method within a multipath packet forwarding procedure 100. A
bounded jitter is introduced by the distributor which can easily be removed using a
token-bucket shaper. The rate of the shaper $k$ is given by $B^i_{j,g,k}$. Packets may arrive out
of order at the destination because of multipaths; however, this is not a concern,
because end-to-end delay bounds are provided, which can be utilized by the application
to detect data losses. It should be appreciated that real-time applications are subject to
a required playback time and packets need only arrive in any order within that time
frame.
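A distributor that allocates packets to next-hops in proportion to their bandwidths can be sketched as below. The pseudocode of FIG. 8 is not reproduced here; this sketch substitutes a smooth weighted round-robin as one possible discipline, counts packets rather than bytes (that is, it assumes equal-sized packets), and uses a hypothetical class name `Distributor`.

```python
class Distributor:
    """Smooth weighted round-robin over next hops, weighted by bandwidth."""

    def __init__(self, bandwidths):
        # bandwidths: {next_hop: bandwidth forwarded to that neighbor}
        self.bandwidths = dict(bandwidths)
        self.credit = {k: 0 for k in self.bandwidths}
        self.total = sum(self.bandwidths.values())

    def next_hop(self):
        """Pick the next hop for one packet."""
        # Every next hop earns credit equal to its bandwidth ...
        for k, bw in self.bandwidths.items():
            self.credit[k] += bw
        # ... and the one with the most credit is chosen and charged.
        chosen = max(self.credit, key=self.credit.get)
        self.credit[chosen] -= self.total
        return chosen
```

With bandwidths 5, 3 and 2 toward three neighbors, any ten consecutive packets starting from a fresh distributor are split 5, 3 and 2 among them. The work per packet is proportional to the number of next hops, which is fixed by the topology.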
When a flow request of bandwidth $\rho$ is made by an application at the source
router $i$ for destination $j$, the source router selects a valid path using the local link
database along any shortest path from $i$ to $j$ that satisfies the bandwidth requirement.
It then initiates hop-by-hop signaling to reserve resources along the selected path. If
the signaling is successful, for each link $(i, k)$ on the path, $B^i_{j,k}$ and $B^i_j$ are incremented
by $\rho$. Assume the source remembers the path along which the reservation for a flow
is made. When a session is terminated, the ingress router initiates a flow teardown
procedure. For each link $(i, k)$ on the path, $B^i_{j,k}$ and $B^i_j$ are decremented by $\rho$. As
in the PKT-SP architecture, the reservations are maintained using soft-states. The
pseudocode for soft-state refresh is the same as the one in FIG. 3 with line 8 modified
to send messages to each $k$ in $S^i_j$ with appropriate bandwidth, instead of a single
message. When $BT^i_j \ge B^i_j$, $i$ needs to simply send a refresh message of bandwidth
$B^i_{j,k}$ to neighbor $k$. When $BT^i_j < B^i_j$, a bandwidth of $B^i_j - BT^i_j$ must be released. This
excess bandwidth is distributed appropriately among the successors and refresh
messages are sent accordingly.

The delay introduced by the shaper of the distributor output can be bounded by
$L_g/\rho_{min}$, where $\rho_{min}$ is the minimum bandwidth of any real-time flow, which must be
included in the end-to-end delay bound for the flow. Since the links have different
bandwidths, the maximum delay of the paths should be chosen for computing the end-to-end
bound. Let $\delta^i_j$ be the delay bound from node $i$ to node $j$, wherein $\delta^i_j$ can be
recursively defined as:

$\delta^i_j = \frac{L_g}{\rho_{min}} + \max \{ d_{ik} + \delta^k_j \mid k \in S^i_j \}$   (10)

where $d_{ik}$ is as in Eq. 9. An end-to-end delay bound for the flow is given by:




(11) 
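The recursive bound of Eq. 10 can be evaluated by a memoized traversal of the next-hop sets. The sketch reads Eq. 10 as a distributor-shaper term $L_g/\rho_{min}$ plus the worst case, over the next hops $k$, of the link delay $d_{ik}$ plus the bound from $k$ to $j$. The function name, the dictionary layout, and the base case (zero delay once the destination is reached) are assumptions added for illustration.

```python
from functools import lru_cache


def make_delay_bound(successors, link_delay, L_g, rho_min):
    """Return delta(i, j), the recursive bound of Eq. 10.

    successors[(i, j)] -> iterable of next hops k in S^i_j
    link_delay[(i, k)] -> per-link bound d_ik
    """

    @lru_cache(maxsize=None)
    def delta(i, j):
        if i == j:
            return 0.0  # assumption: no further delay at the destination
        shaper = L_g / rho_min  # distributor-output shaper term
        return shaper + max(link_delay[(i, k)] + delta(k, j)
                            for k in successors[(i, j)])

    return delta
```

Because the next-hop sets only contain neighbors that are closer to the destination, the recursion terminates, and memoization keeps the cost proportional to the number of (node, next-hop) pairs.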



The architecture in PKT-MP differs from the PKT-SP architecture in the use of 
multipaths. All flows with a particular destination j are aggregated and they collectively
share the bandwidth allocated on links for j . A key benefit of establishing flows along 
the multiple paths is that the routing state is still O(N) , where N is the number of 

nodes in the network. The advantage of this approach is that the state in the routers is 
not increased when a new flow is established. The state size is defined by the network 
topology, rather than user flows. The number of refresh messages on a link is bounded 
by O(N). 

So far, only those neighbors that offer the shortest distance to the destination
were considered for packet forwarding. However, if the neighbors that are equidistant
but have a "lower" address are added to $S^i_j$, then better call-acceptance rates can be
achieved. The routing paths will still be free of loops in this case.

Note that PKT-MP uses only a class field that can be encoded in the DS field of
the IP header and leaves all other fields untouched. In contrast, SCORE reuses some
fields in the IP header, which are described in more detail later.
3.3. PKT-LS Architecture

In the PKT-SP and PKT-MP architectures, flows are always established along
the shortest paths. To lower call-blocking rates, the architecture must allow the user to
set up flows along any path between a source and a destination, as determined by a
QoS path-selection algorithm. The challenge is to do this without using per-flow state.
Described within the present invention is a method of using Multi-Protocol Label-Switching
(MPLS) for setting up the flows along any arbitrary path while replacing per-flow
state with a much smaller aggregated label state. Such techniques have been
previously proposed for Diffserv, but they have not been adequately applied in the
Intserv context.

In a label-switching scheme, the packets are assigned labels which have specific
interpretation in the receiving router. In the scheme, a label is used to uniquely
represent a path between a pair of routers. It is based on the principle that, if two
packets received by a router have the same label, then they must follow the same path
starting from the router to the destination. Packets with the same label may belong to
different flows and may be received from different neighbors, but the routers process
them identically based only on their label. Formally, let label $v$ represent a path $P$ from
router $i$ to destination $j$. Let $k$ be the neighbor of $i$ on the path $P$. Let $u$ be the label
of the path from $k$ to $j$ at $k$. Then for $i$ to forward a packet with label $v$ along $P$, all it
needs to do is replace the label with $u$ and hand it over to the neighbor $k$. To an
application that wants to use path $P$, router $i$ returns the label $v$, which the application
must use to mark its packets.
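The forwarding step described above, in which router $i$ replaces label $v$ with $u$ and hands the packet to neighbor $k$, amounts to a single table lookup. The sketch below is illustrative; the class name `LabelRouter`, the dictionary-based packet, and the string labels are assumptions.

```python
class LabelRouter:
    """Per-label forwarding state: incoming label -> (next hop, outgoing label)."""

    def __init__(self, name):
        self.name = name
        self.table = {}

    def install(self, in_label, next_hop, out_label):
        """Record that packets labeled `in_label` go to `next_hop` as `out_label`."""
        self.table[in_label] = (next_hop, out_label)

    def forward(self, packet):
        """Swap the label and return (next hop, relabeled packet)."""
        next_hop, out_label = self.table[packet["label"]]
        return next_hop, {**packet, "label": out_label}
```

Two packets that carry the same label are forwarded identically, whichever flow or neighbor they came from, so the table holds one entry per label rather than one per flow.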






For example, in a node diagram 110 of FIG. 9, all packets that router B receives
having the same label are forwarded along the same label-switched path from B to
A . A consequence of the path aggregation is that, if two flows with paths C B and
D B share a common suffix subpath B A , only one routing table entry need be
maintained for these two flows at each router in the suffix-path B A. In the same
situation, flow setup that does not use aggregation, such as in virtual circuits, would
create two entries in the routing table.

The routers maintain state on a per-label basis instead of per-destination. Though
per-label state is larger than per-destination state, it is significantly lower than per-flow
state. Representative figures on state sizes are provided later in the description.
FIG. 10 depicts a fragment of pseudocode 120 as an example of soft-state
refresh which is performed on a per-label, per-class basis. An entry in the refresh
message carries a label and the bandwidth for that label. Let the refresh messages
received at $i$ for label $v$ specify a total bandwidth of $BT^i_v$. In comparing $B^i_v$ with $BT^i_v$, if
$BT^i_v$ is less than $B^i_v$ it means that the router can release some allocated bandwidth
equal to $B^i_v - BT^i_v$. The refresh messages for bandwidth $BT^i_v$ and label $u$ are sent
accordingly, to neighbor $k$ associated with label $v$; otherwise $B^i_v$ is sent in the refresh
message.

The rest of the architecture is quite similar to the PKT-SP architecture. PKT-LS
can be viewed as a generalization of PKT-SP where the destination address is replaced
with labels. In PKT-SP the link scheduler performs per-class per-destination
scheduling, while in PKT-LS it processes flows aggregated on a per-class per-label
basis. The routing table stores the total bandwidth for that label. A routing table entry is
of the form $(v, g, B^i_{v,g}, k, u)$, where $B^i_{v,g}$ is the rate of the traffic arriving at the node with
label $v$. At the ingress, all flows with the same label are merged using a shaper-battery;
there is one shaper-battery per label. Similarly, in the core routers there is also one
shaper-battery per label to shape incoming traffic from the neighbors. The flow
establishment is the same as in PKT-SP, except that the flow receives a label from the
ingress router that it must use to mark its packets. The delay bound is the same as in
Eq. 9, where $P$ is now a labeled path instead of the shortest path as in PKT-SP.
The architecture of PKT-LS is more efficient in terms of network bandwidth usage,
which is reflected in a higher call-acceptance rate. This gain comes at the cost of a
larger state in the routers. Because of the use of labels the total number of labels that
have to be maintained at a router is greater than O(N). However, the benefits of
aggregation can be seen from the fact that the amount of label state is significantly less
than per-flow state and only a few times larger than per-destination state and hence,
resource provisioning to prevent refresh message loss can be incorporated. From
experiments, it has been found that the size of the label state is only a few times the
number of nodes in the network.
3.4. BDT-DF Architecture 

The BDT-DF architecture is similar to PKT-LS in all respects, except that it
eliminates the use of the shaper-battery by using BDT classes instead of PKT classes.
Each flow is shaped to an $(L_f, \rho_f)$ flow at the ingress node. The flow's BDT is
determined at the ingress node and each packet of the flow is marked with that class. A
field in the packet specifies the class of the flow. At the links, all packets belonging to
the same class are aggregated into a single flow, irrespective of the flow to which they
belong. Thus, there are $Q$ queues at each link serviced by a WFQ. There is no per-flow
state maintained by the schedulers; only the total bandwidth currently allocated to
each of the $Q$ classes is maintained for use by the admission control. The per-packet
processing is a constant time operation.

A routing table entry is of the form $(v, g, B^i_{v,g}, k, u)$. When a packet with label $v$
and class $g$ is received, its label is used to determine the next hop neighbor $k$ and the
label $u$ to substitute. The packet is placed in the queue $g$ of the link to $k$. The rate for
class $g$ on the link $(i, k)$ is the sum of the rates of all labels for which $k$ is the next
hop. Note that, unlike PKT-LS, there is only a per-class queue in BDT-DF.

The jitter introduced by the scheduler in the aggregate flow is removed at the
next router by reconstructing it to have the same pattern it had before entering the link.
By reconstructing the aggregate flow, the flows comprising the aggregate can be
extracted without destroying their class identity. At the next link, flows can then be
merged with other flows of the same class. To shape traffic, conventional techniques
may be employed; however, the shaping provided is for the aggregate flows. This is
achieved using a regulator and delay-field in the packets as follows. The delay bound
for a packet of class $g$ with cumulative rate of $\rho_g$ at a link $(i, k)$ is given by:

$\theta = \tau_g + \frac{L_{max}}{\rho_g} + \frac{L_{max}}{C_{ik}} + \tau_{ik}$   (12)

Each packet is delayed by an arbitrary amount but never more than the maximum delay
$\theta$. FIG. 11 depicts regulation 130 over a link 132 from router $i$ 134 to router $j$ 136.
The regulator simply delays packets arriving early so that the total delay is $\theta$ and the
flow is restored to the pattern it had before entering the scheduler. Let the aggregate
flow entering the real-time queue be characterized by $(\tau_g, \rho_g)$. If $\theta'$ is the actual delay
experienced by a packet in the scheduler, the regulator delays the packet by $\theta - \theta'$. The
delay $\theta'$ is obtained from the time stamp when the packet enters the scheduler and the
time when it is received by the regulator. Each packet emerging out of the regulator is
therefore delayed by a total amount of $\theta$, and the output of the regulator has a
characterization that is identical to the input to the scheduler, wherein the output of the
regulator has class $g$. As a result of each individual flow belonging to class $g$ before
aggregation, each individual flow contained in the aggregate flow conforms to $g$ after
passing through the regulator. The flow contained in the output of the regulator can
therefore be freely merged with other flows of the same class entering the node via
other links. Current literature in the field describes the efficient implementation of
regulators using calendar queues.
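The regulator's per-packet computation can be sketched as follows. The function names are hypothetical, and the sketch assumes that the time at which the packet entered the upstream scheduler is available at the regulator (for instance, carried in the delay-field) and is expressed on the same clock as the arrival time.

```python
def regulator_hold_time(enter_time, arrival_time, theta):
    """Extra delay (theta - theta') the regulator adds to one packet."""
    actual = arrival_time - enter_time  # theta', the delay already experienced
    if actual > theta:
        raise ValueError("packet exceeded the delay bound theta")
    return theta - actual


def regulator_release_time(enter_time, theta):
    """Every packet leaves exactly theta after it entered the scheduler."""
    return enter_time + theta
```

Since each packet is released exactly $\theta$ after it entered the scheduler, the spacing between packets at the regulator output equals their spacing at the scheduler input, which is why the aggregate keeps its original pattern.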

It is important to clarify that we cannot simply reshape the traffic, using a token
bucket, to conform to class $g$. A regulator should be utilized because flows with
different labels are merged into the same class queue at the next router, wherein packets
merged into the same class may go to different routers. The aggregate flow must be
reconstructed to have the same pattern it had before entering the link.
At any link $(i, k)$, a flow of class $g$ packets experiences a delay of at most
$\tau_g + L_{max}/\rho_g + L_{max}/C_{ik} + \tau_{ik}$. Since $\rho_f \le \rho_g$, the end-to-end delay bound for a flow
with path $P$ is as follows:

$D_P = \sum_{(i,k) \in P} \left( \tau_g + \frac{L_{max}}{\rho_f} + \frac{L_{max}}{C_{ik}} + \tau_{ik} \right)$


The refresh mechanism is identical to the one in PKT-LS. The architecture eliminates the
use of shaper-batteries. The link-scheduler complexity is O(Q). The admission control
only has to ensure that the total allocated bandwidth for real-time traffic does not exceed C or
some policy-determined limit. The delay-field is an added overhead that consumes
bandwidth. The architecture in the next section shows how this field can be removed.
3.5. BDT-LS Architecture

Architecture BDT-DF uses the delay-field and the regulator to remove jitter
introduced by the link schedulers. The delay-field consumes extra bandwidth. The
delay-field is primarily used by the regulator for "restoring" a flow to the form it had
before entering the link at the previous hop. In BDT-LS, by contrast, the aggregated
flow is only "reshaped", using a token-bucket, to belong to class $g$ instead of being
completely restored. In BDT-DF just reshaping would not be useful, because the flows
aggregated into a class can be going to different destinations and individual flows
extracted may then not belong to the same class and cannot readily be mixed with other
flows. For this reason, in BDT-DF a regulator is used with the help of a delay field to
restore the flow to the original shape. In BDT-LS, instead of using a delay field and a
regulator, the link scheduler is expanded to process flows on a per-class, per-label
basis instead of per-class scheduling as in BDT-DF. It will be appreciated, therefore,
that packets with the same label and same class are aggregated into a queue at the link
scheduler. There is one token-bucket per label per class for each incoming link with
parameters $(\tau_g \rho_{v,g}, \rho_{v,g})$, where $\rho_{v,g}$ is the rate of traffic with label $v$ and class $g$. The
link scheduler processes at most $QV$ queues, where $V$ is the number of labels in the
routing table.
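The reshaping step, a token bucket whose bucket size is the class BDT times the rate, can be sketched as below. This is a simplified model under stated assumptions: packets are presented in arrival order, sizes and rates use consistent units, the bucket starts full, and the class name `TokenBucketShaper` is hypothetical.

```python
class TokenBucketShaper:
    """Token bucket with depth tau_g * rate, used to reshape one queue."""

    def __init__(self, tau_g, rate):
        self.rate = rate
        self.depth = tau_g * rate
        self.tokens = self.depth  # bucket starts full
        self.clock = 0.0          # time of the last release

    def release_time(self, arrival, size):
        """Earliest time a packet of `size` arriving at `arrival` may leave."""
        if size > self.depth:
            raise ValueError("packet larger than the bucket depth")
        start = max(arrival, self.clock)
        # Refill for the time elapsed since the last release, capped at depth.
        self.tokens = min(self.depth,
                          self.tokens + (start - self.clock) * self.rate)
        # Wait until enough tokens have accumulated for this packet.
        wait = max(0.0, (size - self.tokens) / self.rate)
        self.tokens = max(0.0, self.tokens + wait * self.rate - size)
        self.clock = start + wait
        return self.clock
```

With a BDT of 2 time units and a rate of 5 units per time unit, the bucket holds 10 units, so three back-to-back 10-unit packets arriving at time 0 leave at times 0, 2 and 4.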

The refresh mechanism and refresh overhead are the same as in BDT-DF. The delay
bounds are also identical to those in BDT-DF.
The advantage of BDT-LS is in eliminating the use of the delay field. The label
and the class of the packets, however, are necessary. The obvious disadvantage is
that the number of queues is increased from O(Q) to O(QV). This should not be a
matter of concern because the complexity is largely determined by the network
parameters rather than user request count.
3.6. BDT-MP Architecture

It should be appreciated that BDT-MP is similar to PKT-MP except that it uses 
BDT classes instead of PKT classes. The use of a shaper-battery and the label field is 
eliminated using BDT-MP. 

A routing table entry is identical to that of PKT-MP. A distributor is used as in
PKT-MP which works on a per-destination per-class basis. Like PKT-MP,
BDT-MP uses no extra field except the class field. The soft-state mechanism is
the same as in PKT-MP.
The link scheduler processes packets aggregated on the basis of destination
and class just as in PKT-MP. The link scheduler processes at most $QN$ queues. The
jitter introduced by the link schedulers is removed using a token-bucket shaper as in
BDT-LS. There is again no need for a regulator, and hence no delay field, because all
packets aggregated into the same queue are destined to the same router.
However, the distributor introduces extra burst that is at most $L_{max}$, which in PKT-MP
was only $L_g$. The jitter introduced by the distributor is again removed using a
token-bucket shaper with parameters $(\tau_g B^i_{j,g,k}, B^i_{j,g,k})$. The delay introduced by the
shaper of the distributor output can be bounded by $L_{max}/\rho_{min}$, where $\rho_{min}$ is the minimum
bandwidth of any real-time flow. The end-to-end delay bound can be recursively
defined as follows:

3.7. AGREE Protocol 
The goal of the "AGgregate REservation Establishment" protocol (AGREE) is to
maintain the consistency of reservations. If at each router i for all destinations j and

classes g , ^ keS , B l hg k - B l . g k + r jg , then reservations are said to be in a 

consistent state. Pseudocode which exemplifies an embodiment of AGREE is shown in
FIG. 12 as an AgreeEvent() procedure 150 and a DiffComp() procedure 160. The
AGREE protocol utilizes soft-states in a similar manner to RSVP and YESSIR, yet,
because the reservation state is on a per-destination, per-class basis, its reservation
refresh messages are predicated on a per-destination per-class basis. Every $T_R$
seconds, which is set as the refresh time period for each destination $j$ and class $g$, the
router $i$ invokes AgreeEvent(TIMEOUT, j, g, -) for comparing the cumulative
reservations of the incoming refresh messages with the current reservations and
sending its own refresh messages. A refresh message specifies the destination $j$,
class $g$ and the associated bandwidth $b$. The source node of each flow sends its
refresh messages to the ingress node every $T_R$ seconds, stating its destination, class
and its rate. At the ingress node all refresh messages of a particular destination and
class are aggregated and a single refresh message is sent to the next-hop. When a
flow terminates, the source stops sending its refresh messages and the bandwidth
reserved for the flow is eventually timed out and released in the network. The core
refresh cycle is shown on lines 02-13 of the pseudocode in FIG. 12. Let the reserved
bandwidth on the outgoing links in $S^i_j$ for class $g$ add up to bw.
Let the refresh messages received by router $i$ for destination $j$ from neighbors
not in $S^i_j$ and refresh messages originating at the router, during the previous refresh
period add up to a total bandwidth of bt. Note that the refresh messages originating at
the router itself add up to P Lg . First, bt is compared with bw, and if bt = bw, the 
reservations are in a consistent state and there is no need to release bandwidth. The 
router simply sends a refresh message to each next-hop k e Sj , with current allocated 
bandwidth $B^i_{j,g,k}$.

Reservations can become inconsistent, that is bt ≠ bw, due to flow
terminations, link failures, control message losses, and similar occurrences. To correct
the inconsistencies two separate cases are considered: (1) bt < bw, and (2) bt > bw.
The first case is handled by lines 6-9 and 12 of the pseudocode in FIG. 12. The total
incoming bandwidth bt is first divided into shares $b_k$, one for each $k \in S^i_j$, such
that $b_k \le B^i_{j,g,k}$; then, for each $k \in S^i_j$, $B^i_{j,g,k}$ is updated with $b_k$
and a refresh message is sent to $k$ with the new bandwidth $b_k$. The second case, wherein
bt > bw, is generally more difficult, and requires forcing the upstream routers to reduce
their outgoing bandwidth. Two methods are described for correcting this inconsistency. The
first method uses the fact that the underlying routing protocol (such as OSPF) informs all
routers about link failures. When a router learns about a failed link, it terminates all
flows that use that link. The soft-state refresh mechanism will then eventually release the
bandwidth reserved for these flows using the same process outlined to handle case (1). A
router is only required to remember the path of each flow that originates from it.
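
By way of illustration only, the refresh timeout handling described above may be
sketched in Python as follows. This is not the pseudocode of FIG. 12: the function name
refresh_timeout, the dictionary used to hold the per-successor reservations, and the
greedy rule used to divide bt among the successors are assumptions made for this example.
The description only requires that each share not exceed the current reservation.

```python
def refresh_timeout(reserved, bt):
    """One refresh timeout for a single (destination, class) pair.

    reserved: dict mapping each successor k to the bandwidth currently
              reserved toward k (B[i][j][g][k] in the text).
    bt:       total bandwidth named by refresh messages from upstream
              neighbors plus flows originating at this router.

    Returns (new_reserved, status); new_reserved also gives the bandwidth
    to place in the refresh message sent to each successor.
    """
    bw = sum(reserved.values())
    if bt == bw:
        # Consistent state: re-advertise the current reservations.
        return dict(reserved), "consistent"
    if bt > bw:
        # Case (2): cannot be repaired locally; upstream must reduce.
        return dict(reserved), "needs_release"
    # Case (1): bt < bw. Divide bt into shares b_k <= reserved[k].
    # Filling successors in sorted order is an assumed policy.
    remaining = bt
    new_reserved = {}
    for k in sorted(reserved):
        share = min(reserved[k], remaining)
        new_reserved[k] = share
        remaining -= share
    return new_reserved, "released"


# Hypothetical example: 10 units reserved, only 7 units still requested.
updated, status = refresh_timeout({"a": 4, "b": 6}, 7)
```

In this hypothetical example the three excess units are released from the reservation
toward successor "b", and the refresh messages carry the reduced values downstream.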

The second method uses a diffusing computation to correct the inconsistencies.
When router $i$ detects failure of adjacent link $(i, k)$, it invokes
AgreeEvent(RELEASE, j, g, k, $B^i_{j,g,k}$) for each $j$ and $g$. The router updates
$B^i_{j,g,k}$ (line 15) and invokes DiffComp() if it is in a PASSIVE state. The DiffComp()
procedure first terminates as many flows as possible at the router, and if there still
exists bandwidth which should be released to restore consistency, the router distributes
the excess bandwidth among upstream neighbors and requests them, using RELEASE
messages, to reduce accordingly the traffic they send to this router (lines 31-34). The
router then enters ACTIVE state, indicating that it is waiting for the upstream nodes to
reply with ACK messages. When an upstream router receives a RELEASE message it
repeats the same process. If a router receives RELEASE messages from successor
nodes while in the ACTIVE state, it immediately sends back an ACK message (line 18).
After all ACK messages are received, it transits to PASSIVE state (line 21) and, if the
transition to ACTIVE state was triggered by a RELEASE message from a downstream
node, it sends the ACK message to the successor node that triggered the transition
to ACTIVE state (line 22). When flow-setup and terminate messages are received, they
are simply forwarded to the next hop after the reservations are modified.

During routing-table convergence, stray release messages may arrive from
current upstream nodes, which are safely ignored by immediately sending ACK
messages even when the router is in PASSIVE state. Similarly, the refresh messages
received from downstream nodes and duplicate refresh messages are ignored. When a
neighbor $k$ is added to or removed from a successor set, the corresponding $B^i_{j,g,k}$
are reset for each $j$ and $g$. Although not explicitly stated in the pseudocode, it will be
appreciated that before initiating the diffusing computation, an attempt may be made to
reserve the required bandwidth through a new request, wherein the diffusing
computation is only triggered if that request fails.
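
By way of illustration only, the PASSIVE/ACTIVE behavior of the diffusing computation
may be sketched in Python as follows. This is a simplified model and not the DiffComp()
procedure of FIG. 12. The Router class, the run() driver, the message tuples, and the
rule that a PASSIVE router with nothing to push upstream acknowledges at once are all
assumptions of the example. The sketch tracks only locally originated bandwidth and
the bandwidth received from each upstream neighbor for one destination and class.

```python
from collections import deque

PASSIVE, ACTIVE = "PASSIVE", "ACTIVE"


class Router:
    """State for one (destination, class) pair at one router."""

    def __init__(self, name, local, upstream):
        self.name = name
        self.local = local              # bandwidth of flows originating here
        self.upstream = dict(upstream)  # neighbor -> bandwidth it sends to us
        self.state = PASSIVE
        self.pending = 0                # ACK messages still awaited
        self.trigger = None             # successor whose RELEASE made us ACTIVE

    def release(self, amount, sender, out):
        """Handle RELEASE(amount) from a successor; sender is None when the
        release is caused by a local link failure."""
        if self.state == ACTIVE:
            if sender is not None:
                out.append((self.name, sender, "ACK", 0))  # acknowledge at once
            return
        cut = min(self.local, amount)   # terminate local flows first
        self.local -= cut
        excess = amount - cut
        for nbr in sorted(self.upstream):   # push the rest upstream
            share = min(self.upstream[nbr], excess)
            if share > 0:
                self.upstream[nbr] -= share
                excess -= share
                out.append((self.name, nbr, "RELEASE", share))
                self.pending += 1
        if self.pending:
            self.state, self.trigger = ACTIVE, sender
        elif sender is not None:
            out.append((self.name, sender, "ACK", 0))

    def ack(self, out):
        """Handle one ACK from an upstream neighbor."""
        if self.state != ACTIVE:
            return                      # stray ACK, ignored
        self.pending -= 1
        if self.pending == 0:
            self.state = PASSIVE
            if self.trigger is not None:
                out.append((self.name, self.trigger, "ACK", 0))
            self.trigger = None


def run(routers, start, amount):
    """Start a release at `start` and deliver messages in FIFO order."""
    queue = deque()
    routers[start].release(amount, None, queue)
    while queue:
        src, dst, kind, amt = queue.popleft()
        if kind == "RELEASE":
            routers[dst].release(amt, src, queue)
        else:
            routers[dst].ack(queue)


# Hypothetical chain A -> B -> C in which C must release 4 units.
routers = {
    "A": Router("A", local=4, upstream={}),
    "B": Router("B", local=1, upstream={"A": 4}),
    "C": Router("C", local=2, upstream={"B": 5}),
}
run(routers, "C", 4)
```

In this hypothetical chain, C terminates its own two units and asks B to release two
more; B terminates one unit and asks A for the last one. The ACK messages then travel
back from A to B to C, and every router returns to PASSIVE state.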

The AGREE protocol can be said to work correctly if, after a sequence of link
failures and refresh message losses, and provided no new flows are set up or terminated,
all reservations reflect a consistent state within a finite time. For correct functioning of
the protocol, it is assumed that messages on a link are received and processed in order.
Abiding by this condition prevents race conditions between flow setup, terminate,
refresh, and release messages. It will be appreciated that the topology stabilizes within
a finite time and that the routing protocol will ensure that loop-free shortest paths are
established for each destination, whereupon all diffusing computations terminate and all
routers return to PASSIVE state for each class-destination pair. In the AGREE protocol,
the release messages and the refresh messages only operate to decrease the reserved
bandwidths. Since bandwidths cannot decrease forever, after a finite time no new
diffusing computations will be initiated, at which time the bandwidth specified by refresh
messages at any node for a particular destination and class can only be less than or
equal to the reserved bandwidth at that node; otherwise, another diffusing computation
would be triggered. If, on the other hand, refresh messages specify lower bandwidth than
the reserved bandwidth, then that extra bandwidth is eventually released by the usual
timeout process of case (1). Therefore, the protocol assures that all reservations
eventually converge to a consistent state.

Up to O(QN) refresh messages are sent on a link within each refresh period,
irrespective of the number of flows in the network. Since the bandwidth requirements for
refresh messages are known a priori, the messages can be serviced through a separate
queue in the link scheduler with a guaranteed delay bound, so refresh messages are never
lost due to queuing delays. This is not possible in per-flow architectures, as the number of
flows on a link cannot be determined a priori. In AGREE, refresh messages can only be lost
due to link failures, which in backbone networks are relatively rare. Even then, the AGREE
protocol is more resilient to refresh message loss than a per-flow architecture.
In per-flow architectures a lost refresh message cannot be distinguished from flow
termination, so the router interprets a lost refresh message as a flow termination and
attempts to release bandwidth from downstream links. In the following cycle, when the
refresh message is received correctly, it tries to recover the released bandwidth. In
contrast, in AGREE, a link can simply use a null refresh message when it does not carry
any traffic for a particular destination and class. This enables distinguishing flow
termination from refresh message loss. When a periodic refresh message is lost, the
receiving node recognizes the loss and continues to use the contents of the refresh
message of the previous cycle. In the following cycle, if a refresh message is received
correctly, the new refresh message is utilized. In essence, refresh messages are sent in a
synchronous manner irrespective of the presence of flows, which is only possible
because AGREE's reservation state is based on network parameters. It should be
appreciated that the described AGREE model and variations thereof provide scalability
because the worst case bounds on state size depend on the number of active
destinations and classes rather than the number of individual flows.
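
By way of illustration only, the distinction between a null refresh message and a lost
refresh message may be sketched in Python as follows. The function name
effective_refresh and the use of None to represent a cycle in which no message arrived
are assumptions of the example and do not appear in FIG. 12.

```python
def effective_refresh(previous, received):
    """Bandwidth a router uses for one neighbor in the current cycle.

    previous: bandwidth used in the previous refresh cycle.
    received: bandwidth named in this cycle's refresh message; 0 for a
              null refresh (no traffic for this destination and class),
              or None if no refresh message arrived in this cycle.
    """
    if received is None:
        # Refresh lost: keep using the contents of the previous cycle.
        return previous
    # Refresh received, possibly null: use the new contents.
    return received


# Hypothetical sequence of cycles: normal, lost, normal, null refresh.
used = 0
history = []
for message in [5, None, 5, 0]:
    used = effective_refresh(used, message)
    history.append(used)
```

In this hypothetical sequence the lost message in the second cycle leaves the reservation
untouched, whereas the explicit null refresh in the last cycle releases it.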

4. Comparison of the architectures

This section compares the described architectures with the Intserv and
SCORE architectures in terms of delay bounds, bandwidth utilization, and control
overhead.

4.1. Delay Bounds 

The end-to-end delay bound in PKT-SP and PKT-LS as given by Eq. 9 is loose
compared to the delay bound achieved via WFQ in per-flow architectures. This is
mainly due to the delays introduced by the shaper-batteries. It has been illustrated how
the delays can be improved using the described thresholding technique. Following are
experiments which test the resulting delay bounds of the architectures.

Consider a network in which a node has at most 31 neighbors, so that the shaper
can be restricted to have a depth of 5. Each of the links is assumed to have 1 Gbs
capacity. For simplicity, propagation delays are ignored in all the experiments. The
packet sizes can range from 100 bytes to 1000 bytes and flow bandwidths range from
64 Kbs (audio) to 3 Mbs (video). The threshold is set at 5 times 64 Kbs. Assume 64
Kbs is the minimum bandwidth of any flow. The threshold rate used is 10 times the rate
for an audio flow. The delay bounds are plotted for the various architectures as a
function of path length.

FIG. 13 and FIG. 14 compare the delays for variable length packets as a function of
path length. FIG. 13 illustrates a graph for variable length packets at an audio
bandwidth of 64 Kbs, while FIG. 14 illustrates a video bandwidth of 3 Mbs.

It will be appreciated that high bandwidth applications, such as shown in FIG. 16
and FIG. 18, are less prone to lengthy delay bounds and more apt to meet the delay limit
of one hundred and fifty milliseconds (150ms). This is because the relation between
end-to-end delay and bandwidth is reciprocal. As mentioned earlier, CCITT states that
an end-to-end delay of one hundred and fifty milliseconds (150ms) is sufficient to support
most real-time applications. The focus is therefore on meeting
this limit of 150ms rather than on beating the tight bound given by WFQ. In FIG. 14 and
FIG. 15, when packets are of fixed length of 100 bytes and 300 bytes, per-flow WFQ
performed significantly better than the other two, but the fact that fairly large path
lengths are within the 150ms bound for all the architectures does not give particular
advantage to the per-flow approach.

For low bandwidth applications when packets are of fixed size, per-flow WFQ is
no better than the other architectures, as shown in FIG. 15 and FIG. 17. For audio, using
a large packet size of 1000 bytes, even per-flow WFQ cannot provide the delays needed
for real-time communications, as shown in FIG. 15. However, when the flow's packets
are bound by 100 bytes and the maximum packet size in the network is 1000 bytes, the
delays of per-flow WFQ are quite large and not useful (refer to FIG. 15). The GSAI
architecture exhibited enhanced performance in relation to WFQ when the packets in
the network were of different sizes. The delay in WFQ is dominated by the maximum
packet size, whereas PKT-SP and BDT-DF through "thresholding" can remove the ill
effects of large packets and improve the end-to-end delays.

It will be appreciated that SCORE generally achieves the same delays as
per-flow WFQ. Using the threshold approach, GSAI can yield a delay reduction for
low-bandwidth flows with long paths compared to per-flow WFQ. For shorter paths, the
delay bounds tend to be within 150ms. Initial experiments on the method indicate that
the delay bounds for flows are as good as, or even better in some cases, than per-flow
WFQ.

4.2. Bandwidth Utilization

Both the SCORE and GSAI architectures have their sources of bandwidth loss. In
SCORE, some link bandwidth is lost due to estimation errors, whereas in the GSAI
architectures the loss is due to thresholding. When the network is sufficiently and
uniformly loaded, the reservations easily exceed the thresholds and there is minimal
waste of bandwidth. Thresholding has an adverse effect when a link carries a large
number of low-bandwidth flows that belong to different flow classes. In SCORE, there is
a bandwidth loss of approximately 10-15%.

It should be appreciated that link bandwidth usage estimates are primarily utilized
for admitting new flows, wherein call-blocking rates are a good indication of the effect of
bandwidth losses. An experiment was conducted to measure the call-blocking rates of
PKT-LS, BDT-DF, Intserv and SCORE. The call-blocking rates were not compared with
other architectures, because they do not allow arbitrary paths to be set up and the
resulting call-blocking rates are not true indications of the effects of thresholding.

The experiment was performed by generating random flow requests in a network,
with the widest-shortest algorithm utilized to select a path between the source and
destination. The reservation is then made on each link on the path. For PKT-LS, when
the flow is the first flow with that destination and class, the bandwidth used is increased
by the threshold. Similarly for BDT-DF, if the flow is the first flow of that class, the
threshold bandwidth is reserved. In SCORE, the estimation algorithm is not
implemented, but an assumption is made that 10% of the bandwidth is lost on each link.
That is, the link bandwidth is manually reduced by 10% when performing the
experiment.

FIG. 19 illustrates call-blocking rates, and it may be observed that PKT-LS and
BDT-DF perform better than SCORE. BDT-DF performs better than PKT-LS because
there are only Q queues in BDT-DF and so at most Q times the threshold bandwidth
can be lost, whereas the lost bandwidth in PKT-LS can be up to QN times the
threshold. The worst case behavior of PKT-LS and BDT-DF will rarely be seen because
the path-selection algorithm tries to optimize and set up flows mostly along the shortest
path, and hence a link may not carry flows of all destinations.
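
By way of illustration only, the effect of thresholding on a single link may be sketched
in Python as follows. The ThresholdedLink class, its method names, the numeric values,
and in particular the accounting rule (each active aggregate is charged the larger of the
threshold and the sum of its flow rates) are assumptions of the example and not the
simulator used for FIG. 19. The aggregate key would be the class for BDT-DF and the
(destination, class) pair for PKT-LS.

```python
class ThresholdedLink:
    """Link bandwidth accounting with a per-aggregate threshold."""

    def __init__(self, capacity, threshold):
        self.capacity = capacity
        self.threshold = threshold
        self.demand = {}   # aggregate key -> sum of admitted flow rates

    def charged(self, demand=None):
        """Bandwidth counted against the link: every active aggregate is
        charged at least the threshold (assumed accounting rule)."""
        demand = self.demand if demand is None else demand
        return sum(max(self.threshold, d) for d in demand.values())

    def admit(self, key, rate):
        """Admit a flow into aggregate `key`, or block the call."""
        trial = dict(self.demand)
        trial[key] = trial.get(key, 0) + rate
        if self.charged(trial) > self.capacity:
            return False
        self.demand = trial
        return True

    def lost(self):
        """Bandwidth charged to the link but not requested by any flow."""
        return self.charged() - sum(self.demand.values())


# Hypothetical values in Kbs: threshold of 5 x 64, audio flows of 64.
bdt_df = ThresholdedLink(capacity=1000, threshold=320)
for cls in ("class0", "class1"):              # Q = 2 aggregates at most
    bdt_df.admit(cls, 64)

pkt_ls = ThresholdedLink(capacity=1000, threshold=320)
admitted = [pkt_ls.admit((dest, cls), 64)     # up to Q x N aggregates
            for dest in range(3) for cls in range(2)]
```

With these hypothetical values the per-class link admits both flows, while the
per-destination, per-class link blocks calls once three aggregates are active, which is
the worst-case behavior described above for a link that carries many distinct
destinations.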

In SCORE, there is strong coupling between flow behavior and the reservation
estimation algorithm. The estimation algorithm depends on the sending of "dummy
packets" by the source when an individual flow rate drops below a predetermined point.
Such unused bandwidth of real-time flows is best utilized for best-effort traffic. Because
there are always inaccuracies in any characterization of a real-time flow, it is even more
imperative that the best-effort traffic utilize as much of that unused bandwidth as
possible. In GSAI, the flow behaviors and reservation maintenance mechanisms are
decoupled and hence GSAI architectures need no such dummy packets.

4.3. State Size

The state size in PKT-SP, PKT-MP and BDT-MP architectures is static and can
be determined a priori from the network parameters. This is far more tractable and
accessible to network engineering than state size that is a function of the dynamic
nature of user flows. However, the state size in PKT-LS, BDT-DF and BDT-LS, which use
label-switching, is not static, but depends on the arrival pattern of the flow requests.
Experiments were performed in which random flow requests are generated
and signaled as in the previous section. The maximum size of the state in the routers is
measured and plotted in FIG. 20, which indicates that the label state size is significantly
smaller than the per-flow state and also levels off very quickly. It is generally a few
times the size of the state in the static architectures, and can be easily bounded, unlike the
per-flow routing state.

4.4. Overhead of Refresh Mechanisms 

A qualitative comparison is made of the refresh message overhead. However,
before proceeding to compare refresh message overhead, a few observations on
SCORE's use of IP header fields for encoding DPS state are in order. In SCORE, each
packet carries information, referred to as Dynamic Packet State (DPS), consisting of the
flow's reservation rate and other packet scheduling information. This extra information
in each packet is an overhead that consumes bandwidth. To eliminate this overhead, a
conventional technique has been suggested which encodes the DPS in the IP header
itself. Some of the DPS variables are stored in the 13-bit long ip_offset field under the
assumption that the field is rarely used. The assumption was justified based on the
observation that in current Internet traces only 0.22% of the packets actually use
the field. The IP packets that use this field are forwarded as best-effort traffic, which
implies the delay guarantees do not hold for 0.22% of the packets. Despite the claim
that the ip_offset field is rarely used, current usage of the field does not indicate its future
use, and it cannot be ruled out that this field will be used extensively in the future.

To avoid using separate MPLS labels for route pinning, SCORE uses a route
pinning technique that uses only the addresses of the routers. It should be appreciated
that the technique may fail to generate correct tables for some routes. Therefore, to
provide a correct implementation of SCORE, the DPS variables and the MPLS labels for
route pinning must be encoded in separate fields. In contrast, PKT-SP, PKT-MP and
BDT-MP require no separate fields except for the class field, for which the TOS field is
used.

In view of the reasons stated above, the assumption that SCORE encodes DPS
variables in fields outside the IP header is carried forward. The overhead of using DPS
should then be considered. By way of example, a link of B Mbs capacity is considered.
Assume SCORE uses packets of size at most X bytes and each packet carries a DPS
state of 2 bytes. When the link bandwidth is fully allocated, the total bandwidth
consumed by the DPS variables is $2 \times B/X$ Mbs. The overhead is determined by the
packet size, wherein the smaller the packets, the greater the overhead. This provides a
disincentive for the use of small packets. In addition, as link capacities increase from
megabytes to gigabytes to terabytes, more and more bandwidth is consumed by the
DPS.
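
By way of illustration only, the DPS overhead expression above may be evaluated in
Python as follows. The function name dps_overhead_mbs and the numeric link capacity and
packet sizes are assumptions chosen for the example; the expression itself treats every
packet as X bytes long, as in the description.

```python
def dps_overhead_mbs(link_mbs, packet_bytes, dps_bytes=2):
    """Bandwidth in Mbs consumed by DPS state on a fully allocated link.

    link_mbs:     link capacity B in Mbs.
    packet_bytes: packet size X in bytes.
    dps_bytes:    DPS state carried per packet (2 bytes in the text).
    """
    return dps_bytes * link_mbs / packet_bytes


# Hypothetical 1000 Mbs link with small and large packets.
small_packets = dps_overhead_mbs(1000, 100)    # 2 * 1000 / 100
large_packets = dps_overhead_mbs(1000, 1000)   # 2 * 1000 / 1000
```

For the hypothetical 1000 Mbs link, 100-byte packets cost ten times as much DPS
bandwidth as 1000-byte packets, and doubling the link capacity doubles the overhead in
either case.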

To estimate the refresh message overhead in GSAI architectures, consider a
network of N routers. Let the refresh message be of size X, sent every $T_R$
seconds. The bandwidth on the link consumed by the refresh messages is then at most
$N \times X/T_R$. This is a conservative bound because the refresh message needs to be
sent for only those destinations for which the link carries traffic. This overhead does not
increase as link bandwidth increases, and only increases as the number of destinations for
which the link carries traffic increases. It should also be noted that the label-switching
refresh message overhead is given by $V \times X/T_R$, where V is the number of labels
associated with the links.
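
By way of illustration only, the refresh overhead bound may be evaluated in Python as
follows. The function name refresh_overhead_bps, the choice of bytes as the unit of the
message size X, and the numeric values are assumptions of the example.

```python
def refresh_overhead_bps(entries, message_bytes, period_s):
    """Upper bound on refresh traffic on one link, in bits per second.

    entries:       N (routers) for destination-oriented routing, or
                   V (labels on the link) for label switching.
    message_bytes: refresh message size X, assumed to be in bytes.
    period_s:      refresh period T_R in seconds.
    """
    return entries * message_bytes * 8 / period_s


# Hypothetical network: 1000 routers, 64-byte messages, 30 s period.
bound = refresh_overhead_bps(entries=1000, message_bytes=64, period_s=30)
```

The bound contains no term for link capacity, so in this sketch the same figure applies
whether the link runs at megabit or gigabit rates.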

Processing cost: In SCORE, extra instructions are required to process each
packet to estimate the reservation, and this cost increases as the link capacity increases.
Also, for a given traffic rate, as packet sizes vary the processing requirements also vary.
This is because of the strong coupling between the estimation algorithm and the
end-user driven data traffic. In GSAI, processing related to reservation maintenance is
decoupled from forwarding data traffic, and since the number of refresh messages on a
link is bounded, so are the CPU cycles required to process them.

Accordingly, it will be seen that this invention provides a family of architectures
(GSAI) that address some of the drawbacks of the Intserv architecture. The
architectures use highly aggregated state in the routers, and yet provide the
deterministic delay and bandwidth guarantees assured under the Intserv model. The
aggregated state approach represents the middle ground between the stateful Intserv
and the stateless SCORE architectures. All the GSAI architectures eliminate the need
for per-flow state maintenance in the routers and are far more scalable than the Intserv
model. Qualitative and quantitative comparisons are made between the GSAI framework
and the other two architectures.

The GSAI architectures are divided into two subfamilies. One family uses
aggregate classes defined using the notion of burst-drain-time. The class definition is
powerful and enables aggregation of a large number of flows into a small fixed number
of queues, on which class-based scheduling is performed. In the class-less family, a
device called the shaper-battery was introduced, which merges and shapes flows into
I-burst flows. Within each subfamily, the use of destination-oriented routing and
label-switching with flow aggregation were demonstrated. It is shown that GSAI provides
a significant improvement over the use of Intserv.

Although the description above contains many specificities, these should not be
construed as limiting the scope of the invention but as merely providing illustrations of
some of the presently preferred embodiments of this invention. Therefore, it will be
appreciated that the scope of the present invention fully encompasses other
embodiments which may become obvious to those skilled in the art, and that the scope
of the present invention is accordingly to be limited by nothing other than the appended
claims, in which reference to an element in the singular is not intended to mean "one
and only one" unless explicitly so stated, but rather "one or more." All structural,
chemical, and functional equivalents to the elements of the above-described preferred
embodiment that are known to those of ordinary skill in the art are expressly
incorporated herein by reference and are intended to be encompassed by the present
claims. Moreover, it is not necessary for a device or method to address each and every
problem sought to be solved by the present invention, for it to be encompassed by the
present claims. Furthermore, no element, component, or method step in the present
disclosure is intended to be dedicated to the public regardless of whether the element,
component, or method step is explicitly recited in the claims. No claim element herein is
to be construed under the provisions of 35 U.S.C. 112, sixth paragraph, unless the
element is expressly recited using the phrase "means for."

Table 1

Reference Table of Architectural Features

Architecture  Class Type  Scheduling       Routing        Delay Field  Label Field
PKT-SP        PKT         Per-dest-class   single path    No           No
PKT-MP        PKT         Per-dest-class   multipaths     No           No
PKT-LS        PKT         Per-label-class  labeled-paths  No           Yes
BDT-DF        BDT         Per-class        labeled-paths  Yes          Yes
BDT-LS        BDT         Per-label-class  labeled-paths  No           Yes
BDT-MP        BDT         Per-dest-class   multipaths     No           No






